Abstract. Vector quantization (VQ) is a lossy data compression technique
from signal processing, for which simple competitive learning is one
standard method to quantize patterns from the input space. Extending
competitive learning VQ to the domain of graphs results in competitive
learning for quantizing input graphs. In this contribution, we propose
an accelerated version of competitive learning graph quantization (GQ)
without trading computational time against solution quality. For this, we
lift graphs locally to vectors in order to avoid unnecessary calculations of
intractable graph distances. In doing so, the accelerated version of
competitive learning GQ gradually turns locally into a competitive learning
VQ with increasing number of iterations. Empirical results show a
significant speedup while maintaining a comparable solution quality.

1 Introduction 

Vector quantization is a classical technique from signal processing suitable for 
lossy data compression, density estimation, and prototype-based clustering [10]. 
The aim of optimal vector quantizer design is to find a codebook of a given size k 
such that an expected distortion with respect to some (differentiable) distortion 
measure is minimized. Competitive learning VQ is one standard method for 
optimal vector quantizer design [5, 14, 30]. 

Graph quantization as formalized in [22] is a generalization of vector quan- 
tization to the quantization of combinatorial structures that can be represented 
by attributed graphs. The generalization from vectors to graphs opens applica- 
tions to diverse domains like proteomics, chemoinformatics, and computer vision, 
where the patterns to be quantized are more naturally represented by trees, lat- 
tices, and graphs. 

Designing optimal graph quantizers has been pursued as central clustering. 
Examples include competitive learning algorithms in the domain of graphs [12, 
13,18,21] and k-means as well as k-medoids algorithms [7,8,18,19,22,23,28, 
29]. A key problem of all these algorithms is that they are slow in practice 
for large datasets of graphs, because calculating the underlying graph distance 
(distortion) measure is a graph matching problem of exponential complexity. 
Given a training set of N graphs, competitive learning, k-means, and k-medoids
calculate (or approximate) at least kN intractable graph distances during each 
cycle through the training set. 



In this contribution, we propose an accelerated version of competitive learn- 
ing GQ. We assume that the underlying distortion measure is a geometric graph 
metric, which arises in various different guises as a common choice of proximity 
measure in a number of applications [1-3, 11,31,33]. The proposed accelerated 
version of competitive learning GQ avoids unnecessary graph distance
calculations by exploiting metric properties and by lifting the graphs
isometrically to a Euclidean vector space whenever possible. Lifting graphs
to vectors reduces competitive learning GQ to competitive learning VQ locally.
If lifting becomes infeasible, we switch back to competitive learning GQ.
By switching back and
forth between competitive learning GQ and competitive learning VQ locally, the 
accelerated version of competitive learning GQ reduces gradually to the efficient 
competitive learning algorithm for VQ. 

The proposed accelerated version of competitive learning GQ has the follow- 
ing properties: First, it can be applied to finite combinatorial structures other 
than graphs like, for example, point patterns, sequences, trees, and hypergraphs. 
For the sake of concreteness, we restrict our attention exclusively to the domain 
of graphs. Second, any initialization method that can be used for competitive 
learning GQ can also be used for its accelerated version. Third, hierarchical 
methods for GQ benefit from the proposed accelerated scheme. Fourth,
competitive learning GQ and its accelerated version perform comparably with respect
to solution quality. Different solutions are caused by approximation errors of the 
graph matching algorithm and by multiple local minima but are not caused by 
the mechanisms to accelerate competitive learning. 

The paper is organized as follows. Section 2 briefly describes competitive
learning GQ. Section 3 proposes an accelerated version of competitive learning 
GQ. Experimental results are presented and discussed in Section 4. Finally, 
Section 5 concludes with a summary of the main results and future work. 

2 Competitive Learning Graph Quantization 

This section briefly introduces competitive learning as a design technique for 
graph quantization. For details on graph quantization and consistency results of 
competitive learning, we refer to [22]. 

2.1 Metric Graph Spaces 

Let $\mathbb{E}$ be an r-dimensional Euclidean vector space. An (attributed) graph
is a triple $X = (V, E, \alpha)$ consisting of a finite nonempty set V of vertices,
a set $E \subseteq V \times V$ of edges, and an attribute function
$\alpha : V \times V \to \mathbb{E}$, such that $\alpha(i, j) = 0$ for each
pair of distinct vertices i, j with $(i, j) \notin E$.

For simplifying the mathematical treatment, we assume that all graphs are of 
order n, where n is chosen to be sufficiently large. Graphs of order less than n, say 
m < n, can be extended to order n by including isolated vertices with attribute 
zero. For practical issues, it is important to note that limiting the maximum order 
to some arbitrarily large number n and extending smaller graphs to graphs of 



order n are purely technical assumptions to simplify mathematics. For machine 
learning problems, these limitations should have no practical impact, because 
neither the bound n needs to be specified explicitly nor an extension of all 
graphs to an identical order needs to be performed. When applying the theory, 
all we actually require is that the graphs are finite. 

A graph X is completely specified by its matrix representation $X = (x_{ij})$
with elements $x_{ij} = \alpha(i, j)$ for all $1 \le i, j \le n$. By concatenating
the columns of X, we obtain a vector representation x of X.
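The construction above can be illustrated with a minimal sketch. It assumes scalar attributes (r = 1) and stores vertex attributes on the diagonal; the helper names `to_matrix` and `lift` are hypothetical and not part of the original text.

```python
import numpy as np

def to_matrix(attrs, order):
    """Build the (order x order) matrix representation of a graph.

    attrs maps vertex pairs (i, j) to scalar attributes (assumption: r = 1).
    Vertices that do not occur in attrs act as isolated vertices with
    attribute zero, which pads the graph to the common order n.
    """
    X = np.zeros((order, order))
    for (i, j), value in attrs.items():
        X[i, j] = value
    return X

def lift(X):
    """Concatenate the columns of X into a vector (column-major order)."""
    return X.flatten(order="F")

# A graph with three attributed vertices and one undirected edge,
# padded to order n = 4 by one isolated vertex.
attrs = {(0, 0): 1.0, (1, 1): 2.0, (2, 2): 3.0,
         (0, 1): 0.5, (1, 0): 0.5}
X = to_matrix(attrs, order=4)
x = lift(X)
```

The vector `x` has n^2 = 16 entries, and entry `i + j * n` of `x` holds the attribute of the pair (i, j).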

Let $\mathcal{X} = \mathbb{E}^{n \times n}$ be the Euclidean space of all
(n x n)-matrices and let $\mathcal{T}$ denote a subgroup of the set
$\mathcal{P}^n$ of all (n x n)-permutation matrices. Two matrices
$X \in \mathcal{X}$ and $X' \in \mathcal{X}$ are said to be equivalent if there
is a permutation matrix $P \in \mathcal{T}$ such that $P^T X P = X'$. The
quotient set

$$\mathcal{X}_\mathcal{T} = \mathcal{X}/\mathcal{T} = \{[X] : X \in \mathcal{X}\}$$

is a graph space of all abstract graphs [X] over the representation space
$\mathcal{X}$ induced by the transformation group $\mathcal{T}$. Note that the
graph space $\mathcal{X}_\mathcal{T}$ is a Riemannian orbifold. The notion of
orbifold is fundamental to extend analytical and geometrical properties of
Euclidean spaces to graph spaces. For details, we refer to [22].

In the remainder of this contribution, we identify $\mathcal{X}$ with
$\mathbb{E}^N$ ($N = n^2$) and consider vector rather than matrix
representations of abstract graphs. By abuse of notation, we sometimes identify
X with [x] and write $x \in X$ instead of $x \in [x]$. We say that X lifts to x
if we represent graph X by vector $x \in X$.

Finally, we equip our graph space with a metric. Let $\|\cdot\|$ be a Euclidean
norm on $\mathcal{X}$. Then the distance function

$$d(X, Y) = \min \{\|x - y\| : x \in X, y \in Y\}$$

is a metric. It is well known that calculating the graph distance metric d(X, Y)
is an NP-complete graph matching problem [11].
Since $\mathcal{T}$ is a group, we have

$$d_y(X) = \min \{\|x - y\| : x \in X\} = d(X, Y),$$

where y is an arbitrary vector representation of Y. By symmetry, we similarly
have $d_x(Y) = d(Y, X)$. Hence, the graph distance d(X, Y) can be determined
by fixing an arbitrary vector representation $y \in Y$ and then finding a vector
representation from X that minimizes $\|x - y\|$ over all vector representations
x from X.

A pair $(x, y) \in X \times Y$ of vector representations is called an optimal
alignment if $\|x - y\| = d(X, Y)$. Thus, we have $d(X, Y) \le \|x - y\|$ for
all vector representations $x \in X$ and $y \in Y$, where equality holds if and
only if x and y are optimally aligned.
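To make the definition of d(X, Y) and of an optimal alignment concrete, the following sketch fixes a representation of Y and searches exhaustively over all relabelings of X. It assumes scalar attributes and takes the full permutation group as the transformation group; the function name `graph_distance` is hypothetical. The search is exponential in n and only meant for very small graphs.

```python
import itertools
import numpy as np

def graph_distance(X, Y):
    """Return (d, X_aligned) with d = min_P ||P^T X P - Y||.

    X and Y are (n x n) matrix representations. All n! permutations are
    enumerated, so this sketch is only usable for tiny n.
    """
    n = X.shape[0]
    best, best_X = np.inf, None
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        Xp = X[np.ix_(p, p)]            # relabeled graph, equals P^T X P
        dist = np.linalg.norm(Xp - Y)   # Frobenius norm = Euclidean norm of the vectorization
        if dist < best:
            best, best_X = dist, Xp
    return best, best_X
```

The returned matrix is a representation of X that is optimally aligned with the fixed representation of Y; for any other representation the Euclidean distance can only be larger.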



Algorithm 1: Competitive Learning GQ

choose an initial codebook $C = \{Y_1, \dots, Y_k\} \subseteq \mathcal{X}_\mathcal{T}$
choose arbitrary vector representations $y_1 \in Y_1, \dots, y_k \in Y_k$
repeat
    randomly select an input graph $X \in \mathcal{X}_\mathcal{T}$
    let $Y_X = \arg\min_{Y \in C} d(X, Y)^2$
    choose $x \in X$ optimally aligned with $y \in Y_X$
    determine learning rate $\eta > 0$
    update $y = y + \eta (x - y)$
until some termination criterion is satisfied
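Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: alignment is done by exhaustive search over permutations, the learning rate follows a hypothetical 1/(1 + t) schedule, random selection is realized as a random sweep over a finite training list, and termination is a fixed number of cycles.

```python
import itertools
import numpy as np

def align(X, Y):
    """Return (d(X, Y), representation of X optimally aligned with Y)."""
    n = X.shape[0]
    candidates = (X[np.ix_(p, p)]
                  for p in map(list, itertools.permutations(range(n))))
    Xa = min(candidates, key=lambda Z: np.linalg.norm(Z - Y))
    return np.linalg.norm(Xa - Y), Xa

def competitive_learning_gq(graphs, codebook, cycles=20, eta0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    codes = [np.asarray(Y, dtype=float).copy() for Y in codebook]
    t = 0
    for _ in range(cycles):                        # assumed termination criterion
        for idx in rng.permutation(len(graphs)):   # random order over the sample
            X = graphs[idx]
            results = [align(X, Y) for Y in codes]  # k graph distances per input
            j = int(np.argmin([d for d, _ in results]))
            eta = eta0 / (1 + t)                   # assumed learning-rate schedule
            codes[j] += eta * (results[j][1] - codes[j])
            t += 1
    return codes
```

Every presented input graph triggers k calls to `align`, which is the cost that the accelerated version tries to avoid.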



2.2 Graph Quantization 

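Let $(\mathcal{X}_\mathcal{T}, d)$ be a metric graph space over the Euclidean
space $(\mathcal{X}, \|\cdot\|)$. A graph quantizer of size k is a mapping of
the form

$$Q : \mathcal{X}_\mathcal{T} \to C,$$

where $C = \{Y_1, \dots, Y_k\} \subseteq \mathcal{X}_\mathcal{T}$ is a codebook.
The elements $Y_j \in C$ are the code graphs.

Adaptive graph quantization design aims at finding a codebook
$C = \{Y_1, \dots, Y_k\} \subseteq \mathcal{X}_\mathcal{T}$ such that the
expected distortion

$$D(C) = \int_{\mathcal{X}_\mathcal{T}} \min_{1 \le j \le k} d(X, Y_j)^2 \, f(X) \, dX$$

is minimized, where $C \subseteq \mathcal{X}_\mathcal{T}$ and
$f = f_{\mathcal{X}_\mathcal{T}}$ is a probability density defined on
some measurable space $(\mathcal{X}_\mathcal{T}, \Sigma_{\mathcal{X}_\mathcal{T}})$.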

A statistically consistent method to minimize the expected distortion using 
empirical data is competitive learning as outlined in Algorithm 1. 
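As an illustration of the quantizer and of a sample-based counterpart of the expected distortion, consider the following sketch. The names `quantize` and `empirical_distortion` are hypothetical, the distance is passed in as a function, and both the squared distance and the normalization by the sample size are assumptions made here for concreteness.

```python
def quantize(x, codebook, dist):
    """Encode x by the closest code in the codebook."""
    return min(codebook, key=lambda y: dist(x, y))

def empirical_distortion(data, codebook, dist):
    """Average squared distance of each pattern to its closest code.

    Assumption: squared distances and 1/N normalization; other conventions
    (for example an unnormalized sum) differ only by a constant factor.
    """
    total = sum(min(dist(x, y) for y in codebook) ** 2 for x in data)
    return total / len(data)

# Toy usage on real numbers with the absolute difference as distance.
abs_dist = lambda a, b: abs(a - b)
value = empirical_distortion([0.0, 1.0, 10.0], [0.0, 10.0], abs_dist)
```

For graphs, `dist` would be replaced by a graph distance routine; the structure of the computation stays the same.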



3 Accelerating Competitive Learning GQ 

This section proposes an acceleration of competitive learning GQ that is based
on a fixed training set $S = \{X_1, \dots, X_N\}$ consisting of N independent
graphs $X_i$ drawn from $\mathcal{X}_\mathcal{T}$. At each cycle through the
training set, competitive learning GQ as described in Algorithm 1 calculates kN
graph distances, each of which is NP-hard to compute. These distance
calculations dominate the computational cost of Algorithm 1. Accelerating
competitive learning GQ therefore aims at reducing the number of graph distance
calculations.

Accelerating competitive learning GQ is based on the key idea that a small
change of a vector representation $y \in Y$ of a code graph $Y \in C$ does not
change already optimally aligned vector representations $x \in X$ of training
graphs $X \in S$ quantized by (closest to) Y. When keeping track of the most
recent optimal alignment, calculating graph distances reduces to calculating
Euclidean distances of vector representations, in particular in the final
convergence phase. Thus, in order to avoid graph distance calculations, we lift
graphs to their optimally aligned vector representations (if possible) and
switch to Euclidean distance calculations.

To describe the accelerated version of Algorithm 1, we assume that $X \in S$ is
a training graph and $Y, Y_X \in C$ are code graphs. By $Y_X = Q(X)$ we denote
the encoding of X. Since the graph distance d is a metric, we have

$$u(X) \le l(X, Y) \;\Rightarrow\; d(X, Y_X) \le d(X, Y), \qquad (1)$$

where

1. $u(X) \ge d(X, Y_X)$ denotes an upper bound of $d(X, Y_X)$ and

2. $l(X, Y) \le d(X, Y)$ denotes a lower bound of $d(X, Y)$.

It follows from Eqn. (1) that we can avoid calculating a graph distance d(X, Y)
if at least one of the following two conditions is satisfied:

(C1) $Y = Y_X$

(C2) $u(X) \le l(X, Y)$.
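For completeness, the implication in Eqn. (1) can be written as a single chain of inequalities that uses only the two bound properties listed above:

```latex
d(X, Y_X) \;\le\; u(X) \;\le\; l(X, Y) \;\le\; d(X, Y).
```

The first and last inequalities are the definitions of the upper and lower bound, and the middle one is the premise of Eqn. (1). The metric property of d is what later allows both bounds to be maintained cheaply when the code graphs move.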

A code graph Y is a candidate encoding for X if both conditions (C1) and (C2)
are violated. In this case, we apply the technique of delayed distance evaluation.
For this, we proceed as follows:

1. We first test whether the upper bound u(X) is out-of-date. An upper bound
is out-of-date if $u(X) \ne d(X, Y_X)$. Note that we can check the state of u(X)
without calculating the distance $d(X, Y_X)$.

2. If u(X) is out-of-date, we calculate the distance $d(X, Y_X)$ and set
$u(X) = d(X, Y_X)$. Since improving the upper bound u(X) might eliminate Y as
a candidate encoding for X, we recheck condition (C2).

3. If condition (C2) is still violated despite the updated upper bound u(X), we
have the following situation:

$$u(X) = d(X, Y_X) > l(X, Y).$$

We update the lower bound by calculating $l(X, Y) = d(X, Y)$ and then
re-examine condition (C2).

4. If condition (C2) remains violated, we have

$$u(X) = d(X, Y_X) > d(X, Y) = l(X, Y).$$

This implies that X is closer to code graph Y than to its current encoding
$Y_X$. In this case Y becomes the new encoding of X.



Algorithm 2: Accelerated Competitive Learning GQ

choose an initial codebook $C = \{Y_1, \dots, Y_k\} \subseteq \mathcal{X}_\mathcal{T}$
choose arbitrary vector representations $y_1 \in Y_1, \dots, y_k \in Y_k$
set $u(X) = \infty$ and declare u(X) out-of-date for all $X \in S$
repeat
    store Y in Y' for all $Y \in C$
    repeat
        randomly select $X \in S$
        $Y_X$ = classify(X)
        determine learning rate $\eta > 0$
        set $y = y + \eta (x_a - y)$
    until the cycle through S is complete
    estimate_bounds()
until some termination criterion is satisfied



Algorithm 3: classify(X)

for each $Y \in C$ do
    if Y is a candidate encoding for X
        if u(X) is out-of-date
            update_bounds(X, Y)
        if Y is still a candidate encoding for X
            if $d(X, Y) < u(X)$
                update_bounds(X, Y)
                set $Y_X = Y$
return $Y_X$



Algorithm 4: update_bounds(X, Y)

set $u(X) = d(X, Y_X)$
set $l(X, Y_X) = d(X, Y_X)$
declare u(X) as up-to-date
store $x_a \in X$ with $d(X, Y_X) = \|x_a - y\|$



Algorithm 5: estimate_bounds()

compute $\delta(Y) = d(Y, Y')$ for all $Y \in C$
set $u(X) = \min \{u(X) + \delta(Y_X), \|x_a - y\|\}$ for all $X \in S$
declare u(X) as out-of-date for all $X \in S$
set $l(X, Y) = \max \{l(X, Y') - \delta(Y), 0\}$ for all $X \in S$ and for all $Y \in C$



Otherwise, if at least one of the conditions (C1) and (C2) is satisfied or becomes
satisfied during examination, $Y_X$ remains the encoding of X.
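The delayed distance evaluation can be sketched as follows. To keep the example runnable, it uses Euclidean vectors as a stand-in for graphs, so no alignments have to be stored; the class `Bounds`, its fields, and the call counter are hypothetical names introduced for illustration.

```python
import numpy as np

class Bounds:
    def __init__(self, n_data, n_codes):
        self.u = np.full(n_data, np.inf)           # upper bounds u(X)
        self.l = np.zeros((n_data, n_codes))       # lower bounds l(X, Y)
        self.stale = np.ones(n_data, dtype=bool)   # True if u(X) is out-of-date
        self.enc = np.zeros(n_data, dtype=int)     # index of current encoding Y_X
        self.calls = 0                             # number of distance evaluations

def classify(i, data, codes, b, dist):
    for j in range(len(codes)):
        # (C1) Y = Y_X or (C2) u(X) <= l(X, Y): no distance is needed.
        if j == b.enc[i] or b.u[i] <= b.l[i, j]:
            continue
        if b.stale[i]:
            # Step 2: refresh u(X) with the true distance to the encoding.
            d_enc = dist(data[i], codes[b.enc[i]])
            b.calls += 1
            b.u[i] = d_enc
            b.l[i, b.enc[i]] = d_enc
            b.stale[i] = False
            if b.u[i] <= b.l[i, j]:
                continue
        # Step 3: replace the lower bound by the true distance d(X, Y).
        d = dist(data[i], codes[j])
        b.calls += 1
        b.l[i, j] = d
        if d < b.u[i]:
            # Step 4: Y is closer and becomes the new encoding.
            b.enc[i] = j
            b.u[i] = d
    return int(b.enc[i])
```

If the codes do not move between two calls, the second call for the same pattern needs no distance evaluation at all, because every bound is already tight.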

Crucial for avoiding NP-hard graph distance calculations are good estimates
of the lower and upper bounds l(X, Y) and u(X) after each cycle through the
training set S. For this, we keep a record of the most recent optimal alignment
of an input graph X and its current code graph $Y_X$. Suppose that y is a vector
representation of $Y_X$ during a cycle through the training set. Each time we
calculate a distance $d(X, Y_X)$, we obtain a vector representation $x \in X$
such that (x, y) is an optimal alignment of $X \times Y_X$. By $x_a \in X$ we
denote the vector representation of X of an optimal alignment obtained by the
most recent distance calculation $d(X, Y_X)$. Updating the bounds is then
carried out as follows: After each cycle through the training set S, we compute
the change $\delta(Y)$ of each centroid Y by the distance

$$\delta(Y) = d(Y, Y'),$$

where Y' is the code graph before the t-th cycle through the training set S and
Y is the current code graph after the t-th cycle through S. Based on the triangle
inequality of d, we set the bounds according to the following rules:

$$l(X, Y) = \max \{l(X, Y') - \delta(Y), 0\} \qquad (2)$$
$$u(X) = \min \{u(X) + \delta(Y_X), \|x_a - y\|\}. \qquad (3)$$

Both rules guarantee that l(X, Y) is always a lower bound of d(X, Y) and u(X) is
always an upper bound of $d(X, Y_X)$. Note that the cost of calculating a
Euclidean distance and the overhead of storing the aligned vector representation
$x_a$ of each training graph X are negligible compared to calculating the
NP-hard graph distance d.
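The effect of rules (2) and (3) can be checked numerically. The sketch below again uses Euclidean vectors as a stand-in for graphs, so the change of a code is simply the norm of its displacement; the second term of rule (3), which relies on stored alignments, is deliberately omitted here. The function name `estimate_bounds` mirrors Algorithm 5, but the signature is an assumption.

```python
import numpy as np

def estimate_bounds(u, l, enc, old_codes, new_codes):
    """Shift the bounds by the displacement of each code.

    u: (N,) upper bounds, l: (N, k) lower bounds, enc: (N,) encoding indices.
    Only the triangle-inequality terms of rules (2) and (3) are applied.
    """
    delta = np.linalg.norm(new_codes - old_codes, axis=1)   # change of each code
    l_new = np.maximum(l - delta[None, :], 0.0)             # rule (2)
    u_new = u + delta[enc]                                  # first term of rule (3)
    return u_new, l_new
```

By the triangle inequality, a distance can change by at most the displacement of the code, so the shifted values remain valid bounds without any new distance calculation.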

Finally, we declare upper bounds as out-of-date if

$$\delta(Y) = d(Y, Y') > \theta,$$

where $\theta \ge 0$ is some prespecified control parameter that trades accuracy
against speed. This is motivated by the following considerations. For the sake of
simplicity, we assume that $\theta = 0$. From

$$\delta(Y) = d(Y, Y') = 0$$

it follows that the current and the previous code graph are identical. This
implies that an optimal alignment (x, y') of $X \times Y'_X$ is also an optimal
alignment (x, y) of $X \times Y_X$ for all training graphs X encoded by $Y_X$.
According to Eqn. (3) the upper bounds are then of the form

$$u(X) = \|x - y\| = d(X, Y_X).$$

As a consequence, we may declare u(X) as up-to-date rather than out-of-date.
This makes a delayed distance evaluation of $d(X, Y_X)$ unnecessary in the next
iteration. To further accelerate competitive learning GQ, we can generalize this
idea to small changes

$$\delta(Y) = d(Y, Y') \le \theta.$$

The underlying assumption of this heuristic is that the more similar the previous
and the recomputed code graphs are, the more likely an optimal alignment
(x, y') of $X \times Y'_X$ is also an optimal alignment (x, y) of
$X \times Y_X$. In fact, we have

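$$d(X, Y_X) \le \|x - y\| \le \|x - y'\| + \theta = d(X, Y'_X) + \theta,$$

showing that the upper bound $\|x - y\|$ exceeds the previous distance
$d(X, Y'_X)$ by at most $\theta$. Since $d(X, Y'_X)$ itself differs from
$d(X, Y_X)$ by at most $\theta$ by the triangle inequality, the upper bound
stays within a small multiple of $\theta$ of $d(X, Y_X)$.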

Algorithm 2 describes the accelerated version of competitive learning GQ.
The accelerated version calls the subroutines classify, update_bounds, and
estimate_bounds described in Algorithms 3, 4, and 5, respectively. The
subroutine classify determines the encoding of the current training graph by
applying the principle of delayed distance evaluation. The subroutine
update_bounds is an auxiliary subroutine for updating the bounds and keeping
track of the most recent optimal alignment of the current training graph and its
encoding. Finally, estimate_bounds is a subroutine that re-estimates the lower
and upper bounds after each cycle through the training set.

4 Experiments 

This section reports the results of running standard competitive learning GQ 
and its accelerated version. 

4.1 Data

We selected four data sets described in [27]. The data sets are publicly available
at [16]. Each data set is divided into a training, a validation, and a test set. In
all four cases, we considered data from the test set only. The descriptions of the
data sets are mainly excerpts from [27]. Table 1 provides a summary of the main
characteristics of the data sets.



data set      #(graphs)  #(classes)  avg(nodes)  max(nodes)  avg(edges)  max(edges)
letter           750        15          4.7          8           3.1          6
grec             528        22         11.5         24          11.9         29
fingerprint      900         3          8.3         26          14.1         48
molecules        100         2         24.6         40          25.2         44

Table 1. Summary of main characteristics of the data sets.






Fig. 1. Example of letter drawings: Prototype of letter A and distorted copies generated 
by imposing low, medium, and high distortion (from left to right) on prototype A. 

Letter Graphs. We consider all 750 graphs from the test data set representing 
distorted letter drawings from the Roman alphabet that consist of straight lines 
only (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z). The graphs are uniformly 
distributed over the 15 classes (letters). The letter drawings are obtained by dis- 
torting prototype letters at low distortion level. Lines of a letter are represented 
by edges and ending points of lines by vertices. Each vertex is labeled with a 
two-dimensional vector giving the position of its end point relative to a reference 
coordinate system. Edges are labeled with weight 1. Figure 1 shows a prototype
letter and distorted versions at various distortion levels.

GREC Graphs. The GREC data set [4] consists of graphs representing symbols 
from architectural and electronic drawings. We use all 528 graphs from the test 
data set uniformly distributed over 22 classes. The images occur at five different 
distortion levels. In Figure 2 for each distortion level one example of a draw- 
ing is given. Depending on the distortion level, either erosion, dilation, or other 
morphological operations are applied. The result is thinned to obtain lines of 
one pixel width. Finally, graphs are extracted from the resulting denoised im- 
ages by tracing the lines from end to end and detecting intersections as well 
as corners. Ending points, corners, intersections and circles are represented by 
vertices and labeled with a two-dimensional attribute giving their position. The 
vertices are connected by undirected edges which are labeled as line or arc. An 
additional attribute specifies the angle with respect to the horizontal direction 
or the diameter in case of arcs. 



Fingerprint Graphs. We consider a subset of 900 graphs from the test data set
representing fingerprint images of the NIST-4 database [32]. The graphs are
uniformly distributed over three classes left, right, and whorl. A fourth class
(arch) is excluded in order to keep the data set balanced. Fingerprint images are
converted into graphs by filtering the images and extracting regions that are
relevant [26]. Relevant regions are binarized and a noise removal and thinning
procedure is applied. This results in a skeletonized representation of the
extracted regions. Ending points and bifurcation points of the skeletonized
regions are represented by vertices. Additional vertices are inserted at regular
intervals between ending points and bifurcation points. Finally, undirected edges
are inserted to link vertices that are directly connected through a ridge in the
skeleton. Each vertex is labeled with a two-dimensional attribute giving its
position. Edges are attributed with an angle denoting the orientation of the edge
with respect to the horizontal direction. Figure 3 shows fingerprints of each
class.

Fig. 2. GREC symbols: A sample image of each distortion level.

Fig. 3. Fingerprints: (a) Left (b) Right (c) Arch (d) Whorl. Fingerprints of class arch
are not considered.

Molecules. The mutagenicity data set consists of chemical molecules from two
classes (mutagen, non-mutagen). The data set was originally compiled by [24]
and reprocessed by [27]. We consider a subset of 100 molecules from the test data
set uniformly distributed over both classes. We describe molecules by graphs in
the usual way: atoms are represented by vertices labeled with the atom type
of the corresponding atom and bonds between atoms are represented by edges
labeled with the valence of the corresponding bonds. We used a 1-to-k binary
encoding for representing atom types and valence of bonds, respectively.

4.2 General Experimental Setup 

In all experiments, we applied standard competitive learning GQ (std) and its 
accelerated version (acc) to the aforementioned data sets by using the following 
experimental setup: 

Setting of competitive learning GQ algorithms. To initialize the standard and
accelerated competitive learning GQ algorithm, we used a modified version of
the "furthest first" heuristic [15]. For each training set S, the first code graph
$Y_1$ is initialized to be a graph closest to the sample mean of S (see [19, 20] for
details on computing the sample mean of graphs). Subsequent code graphs are
initialized according to

$$Y_{i+1} = \arg\max_{X \in S} \min_{Y \in C_i} d(X, Y),$$

where $C_i$ is the set of the first i centroids chosen so far. We terminated both
competitive learning GQ algorithms after 150 cycles through the training set.
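The initialization rule can be sketched as follows. The function name `furthest_first` is hypothetical, the distance is passed in as a function, and the first code is supplied by the caller instead of being derived from a sample mean of graphs, which is outside the scope of this sketch.

```python
def furthest_first(data, k, dist, first):
    """Pick k codes: start with `first`, then repeatedly add the pattern
    whose distance to its closest already chosen code is largest."""
    codebook = [first]
    while len(codebook) < k:
        nxt = max(data, key=lambda x: min(dist(x, y) for y in codebook))
        codebook.append(nxt)
    return codebook

# Toy usage on real numbers with the absolute difference as distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
init = furthest_first(points, k=3, dist=lambda a, b: abs(a - b), first=2.0)
```

Each additional code requires distances from all patterns to the codes chosen so far, so for graphs the initialization already involves a substantial number of graph distance calculations.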



Graph distance calculations and optimal alignment. For graph distance
calculations and finding optimal alignments, we applied a depth first search
algorithm on the letter data set and the graduated assignment algorithm [11] on
the grec, fingerprint, and molecule data sets. The depth first search method is
guaranteed to return optimal solutions but, being an exact method for a graph
matching problem of exponential complexity, can be applied to small graphs only.
Graduated assignment returns approximate solutions.

Performance measures. We used the following measures to assess the perfor- 
mance of an algorithm on a dataset: (1) empirical distortion, (2) classification 
accuracy, (3) silhouette index, and (4) number of graph distance calculations. 
The empirical distortion is here given by 



The silhouette index is a cluster validation index taking values from [-1, 1].
Higher values indicate a more compact and well separated cluster structure. For 
more details we refer to Appendix A and [30]. 
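A minimal sketch of the silhouette index, computed from a precomputed distance matrix, is given below. It follows the cluster-wise averaging described in Appendix A; the function name, the treatment of singleton clusters (silhouette width zero), and the guard against zero denominators are assumptions of this sketch. At least two clusters are required.

```python
import numpy as np

def silhouette_index(D, labels):
    """Average of the cluster silhouettes for distance matrix D (m x m)."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False                    # own cluster without the pattern itself
        if not own.any():
            continue                      # singleton cluster: width 0 (assumption)
        a = D[i, own].mean()              # average distance within own cluster
        b = min(D[i, labels == c].mean()  # closest other cluster on average
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            s[i] = (b - a) / max(a, b)    # silhouette width of pattern i
    # Silhouette of each cluster, then the average over all clusters.
    return float(np.mean([s[labels == c].mean() for c in clusters]))

# Toy usage: two well separated groups on the real line.
pts = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(pts[:, None] - pts[None, :])
value = silhouette_index(D, [0, 0, 1, 1])
```

For graphs, the matrix D would hold pairwise graph distances, which makes the index itself expensive to evaluate on large samples.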

Accelerated competitive learning GQ incurs computational overhead to cre- 
ate and update auxiliary data structures and to compute Euclidean distances. 
This overhead is negligible compared to the time spent on graph distance calcu- 
lations. Therefore, we report number of graph distance calculations rather than 
clock times as a performance measure for speed. 

4.3 Performance Comparison 

We applied both competitive learning GQ algorithms to all four data sets in
order to assess and compare their performance. The control parameter $\theta$ of
the accelerated version was set to zero. For each data set, 10 runs of each
algorithm were performed and the best result with respect to the error value
(empirical distortion) was selected. We do not report averages, because 10 runs
are too few to provide statistically significant results, and computational
resources were not sufficient for conducting a larger number of runs.

Table 2 summarizes the results. The first observation to be made is that 
the solution quality of standard and accelerated competitive learning GQ is 
comparable with respect to error, classification accuracy, and silhouette index. 
Deviations are due to the non-uniqueness of the solutions and the approximation 
errors of the graduated assignment algorithm. The second observation to be made 
from Table 2 is that the accelerated version outperforms standard competitive 
learning GQ with respect to computation time on all data sets.

Table 2. Results of standard competitive learning GQ (std) and accelerated
competitive learning GQ (acc) on four data sets. Rows labeled matchings give the
number of graph distance calculations, and rows labeled speedup show how many
times an algorithm is faster than standard competitive learning GQ for graphs.

The results show that the speedup factor increases with increasing number
k of codebook graphs but is apparently independent of the average cluster size
N/k. Contrasting the silhouette index and the dimensionality of the data with
the speedup factor gained by accelerated competitive learning GQ, we make
the following observations: First, the silhouette indices for the letter, grec,
and fingerprint data sets are roughly comparable and indicate a cluster structure
in the data, whereas the silhouette index for the molecule data set indicates
almost no compact and homogeneous cluster structure. Second, the dimensionality
of the vector representations is largest for molecule graphs, moderate for grec
graphs, and relatively low for letter and fingerprint graphs. Thus, the speedup
factor of accelerated competitive learning GQ apparently decreases with increasing
dimensionality and decreasing cluster structure. This behavior is in line with
Elkan's k-means for vectors [6] and Elkan's k-means for graphs [23] as well as
with findings in high-dimensional vector spaces. According to [25], there will be
little or no acceleration in high dimensions if there is no underlying structure in
the data. This view is also supported by theoretical results from computational
geometry [17].

Figure 4 shows how the number of graph distance calculations of accelerated 
competitive learning GQ decreases with increasing number of cycles through the 
training set. The standard version of competitive learning GQ requires kN graph 
distance calculations at each cycle corresponding to 100%. For the letter, grec,
and fingerprint data sets, the plot shows that after a few cycles more than 80%
of the kN graph distance calculations can be avoided and replaced by calculations
of simple Euclidean distances. Accelerated competitive learning GQ operates in
both spaces, the Euclidean graph space and its underlying Euclidean
representation space. The graph space is used for choosing vector
representations optimally aligned to their code graphs. Given optimally aligned
vector representations, competitive learning GQ reduces to competitive learning
VQ in the representation space. Switching back to the graph space serves to
correct the optimal alignments between the vector representations of the input
graphs and their closest code graphs. As the plot shows, the corrective function
of the graph space is needed less and less with increasing number of cycles
through the training set.

5 Conclusion 

Accelerated competitive learning GQ avoids graph distance calculations by ex- 
ploiting the metric properties of the graph distance and by lifting input graphs 
from the training set to vector representations that are optimally aligned to 
their encodings. In doing so, accelerated competitive learning GQ switches be- 
tween two spaces, the graph space and its underlying Euclidean representation 
space. The competitive learning in the graph space serves as a corrective of the 
fast but erroneous counterpart in the representation space. With increasing time 
competitive learning for GQ gradually turns into competitive learning for VQ 
without loss of solution accuracy. In particular, during the long final conver- 
gence phase most graph distance calculations can be avoided and replaced by 
Euclidean distances, provided the data possesses a cluster structure. We conclude 
that accelerated competitive learning GQ is a first step to avoid intractable graph 
distance calculations without loss of structural information. We believe that the 
proposed acceleration can also be applied to GQ using more general graph edit
distance metrics as underlying distortion measures. As a next step, future work
is concerned with an empirical investigation of the lifting threshold $\theta$ to
trade solution quality against speed.

Fig. 4. Percentage of graph distances calculated by accelerated competitive learning
GQ at each cycle through the training set. The standard version of competitive learning
GQ requires 100% (corresponding to kN) graph distance calculations at each cycle. The
number k for each training set is shown in parentheses next to the identifiers of the
data sets in the legend box.

A The Silhouette Index 

Suppose that $S = \{X_1, \dots, X_m\}$ is a sample of m patterns. Let
$C = \{C_1, \dots, C_k\}$ be a partition of S consisting of k disjoint clusters
with

$$S = \bigcup_{i=1}^{k} C_i.$$

We assume that D is the underlying distance function defined on S. The distance
between two subsets $U, U' \subseteq S$ is defined by

$$D(U, U') = \min \{D(X, X') : X \in U, X' \in U'\}.$$

If $U = \{X\}$ consists of a singleton, we simply write $D(X, U')$ instead of
$D(\{X\}, U')$. Let

$$D_{avg}(X, U) = \frac{1}{|U|} \sum_{X' \in U} D(X, X')$$

denote the average distance between pattern $X \in S$ and subset
$U \subseteq S$. Suppose that pattern $X_i \in S$ is a member of cluster
$C_{m(i)} \in C$. By $C'_{m(i)}$ we denote the set $C_{m(i)} \setminus \{X_i\}$.
For each pattern $X_i \in S$ let

$$a_i = D_{avg}(X_i, C'_{m(i)})$$

be the average distance between pattern $X_i$ and subset $C'_{m(i)}$. By

$$b_i = \min_{j \ne m(i)} D_{avg}(X_i, C_j)$$

we denote the minimum average distance between pattern $X_i$ and all clusters
from C not containing $X_i$. The silhouette width of $X_i$ is defined as

$$s_i = \frac{b_i - a_i}{\max(b_i, a_i)}.$$

The silhouette of cluster $C_j \in C$ is given by

$$S_j = \frac{1}{|C_j|} \sum_{X_i \in C_j} s_i.$$

The silhouette index is then defined as the average of all cluster silhouettes

$$\bar{S} = \frac{1}{k} \sum_{j=1}^{k} S_j.$$